must be created at highly precise scales, necessitating difficult, costly, and time-consuming fieldwork. Deep learning algorithms have significantly improved outcomes when using data in the spatial and temporal dimensions, which are essential for agricultural research. The simultaneous availability of Sentinel-1 (synthetic aperture radar) and Sentinel-2 (optical) data provides an excellent opportunity to combine them. Sentinel-1 and Sentinel-2 datasets were collected for the Cape Town region of South Africa. On these datasets, we apply layer-level fusion, one of the three fusion strategies (input level, layer level, and decision level). We also compare the results before and after fusion and discuss the recommended method for replacing a multilayer perceptron decoder with a semi-supervised decoder architecture. The AdaMatch semi-supervised decoder achieved an overall testing accuracy of 80.3%. Our experiments demonstrate that our methodology not only outperforms prior state-of-the-art approaches in terms of precision but also significantly decreases processing time and memory requirements.

Keywords: Sentinel; AdaMatch; Layer Level Fusion; Multi Layer Perceptron; Remote Sensing


1. Introduction

in a wide range of high-impact applications as more public and commercial entities gain access to high-quality satellite data. One of these is the classification of crop types, which presents a significant challenge to those in charge of agricultural and environmental policies. According to Foerster et al., crop type prediction is useful for managing the food supply and children's wellness in developing nations, as well as for simulating flood damage and assessing water quality. Although crop maps are useful only during the growing season when used as input for crop area projections, hazard prediction, or water consumption calculations, demand for information on the geographic distribution and dynamics of various crop types has recently increased. Yet it can be difficult to determine crop distribution accurately, especially early in the growing season.


For efficient agricultural monitoring, accurate crop maps must be created during the current growing season. A number of organisations have carried out large-scale studies of regional crop distribution from year to year, but little is known about the dynamics of crop composition and geographic range within a season. Understanding how crops are distributed during the early phases of development enables timely adjustment of the crop planting structure, agricultural management, and decision-making. Machine learning techniques may be used


Agriculture is one of the most important ecosystems for human subsistence. Agricultural resources are under significant supply-side stress owing to rapid population growth, poor farming practices, increased pest damage driven by climate change, the loss of fertile land to human activities such as urbanisation, and inadequate pest control. To address this issue, we have developed a deep learning architecture that categorises the crop types in each agricultural area. The Sentinel-1 and Sentinel-2 datasets (Stendardi et al.; Wang et al.) carry crop labels for barley, canola, lucerne/medics, wheat, and small grain grazing. We employ deep learning techniques to build a model that offers a rapid and accurate approach to categorising crop types in croplands. With this approach, which also evaluates drought risk, farmers can forecast crop yields across diverse land patterns.


The simultaneous availability of Sentinel-1 (synthetic aperture radar) and Sentinel-2 (optical) data provides a tremendous opportunity to integrate the two. To address the operational requirements of the Copernicus program precisely, the European Space Agency (ESA) has created a new family of missions called Sentinels. Each Sentinel mission is made up of a constellation of satellites that meets the revisit and coverage requirements and supplies reliable data for the Copernicus services. These missions carry a variety of instruments, including radar and multispectral imaging equipment, for monitoring the Earth's surface, the oceans, and the atmosphere.


Radiant ML Hub, an open-source repository for machine learning datasets, provided the Sentinel-1 and Sentinel-2 training and testing datasets. The obtained datasets were carefully inspected, validated, and normalised to remove noise. A vegetation index (VI) is a spectral transformation of two or more image bands (Lymburner). There are several VIs, many of which behave similarly. Many indices exploit the inverse relationship between red and near-infrared reflectance that is associated with healthy green vegetation.
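As an illustration of the red/near-infrared contrast described above, the sketch below computes the Normalized Difference Vegetation Index (NDVI), one widely used VI. The text does not state which index was used, so NDVI and the band mapping (Sentinel-2 B4 as red, B8 as near-infrared) are assumptions, and the reflectance values are made up.

```python
import numpy as np


def ndvi(red: np.ndarray, nir: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red), computed element-wise.

    Healthy green vegetation reflects strongly in NIR and absorbs red light,
    so NDVI approaches 1 over dense vegetation and stays near 0 over bare soil.
    `eps` guards against division by zero over very dark pixels.
    """
    red = np.asarray(red, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    return (nir - red) / (nir + red + eps)


# Hypothetical surface reflectances (assumed: Sentinel-2 B4 = red, B8 = NIR)
# for three pixels: dense crop, sparse crop, bare soil.
red = np.array([0.04, 0.10, 0.20])
nir = np.array([0.50, 0.25, 0.22])
print(np.round(ndvi(red, nir), 3))
```

Applied per date, the same function turns a parcel's red and NIR series into a VI time series.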


To categorise and predict crop types effectively, this work combines semi-supervised learning algorithms with deep learning methods such as the pixel-set encoder and temporal attention encoder architecture. Unlike post-season crop mapping, this method maps crop types during the season, throughout crop growth, to improve agricultural production management. Crop mapping (Nijhawan et al.) based on high-resolution satellite data can address a wide range of relevant issues, including crop area estimation, yield forecasts, and drought risk assessment.


2. Study Area and Datasets 


Radiant ML Hub, an open-source repository for machine learning datasets, provided the Sentinel-1 and Sentinel-2 training and testing datasets. The European Space Agency (ESA) is developing the Sentinel mission family primarily to meet the operational requirements of the Copernicus program. The Radiant Earth Foundation makes accessible the crop-label vector data from the Western Cape Department of Agriculture (WCDOA), which carry restricted dissemination rights. Sentinel-1 is an all-weather, day-and-night, polar-orbiting radar imaging mission that monitors both land and water. Sentinel-1A was launched on April 3, 2014, and Sentinel-1B on April 25, 2016; both were carried into orbit by Soyuz rockets launched from the European Spaceport in French Guiana. Sentinel-1B's mission ended in 2022, and Sentinel-1C will launch as soon as it is feasible to do so. Sentinel-2 is a polar-orbiting, multispectral, high-resolution imaging mission for land monitoring. It provides imagery of vegetation, soil and water cover, inland waterways, and coastal areas, and can also supply information for emergency services. Sentinel-2A and Sentinel-2B were launched on June 23, 2015, and March 7, 2017, respectively.


The dataset was obtained from the source already adjusted and normalised. To bring pixel values onto a single scale while preserving underlying similarities and differences, the bands are normalised individually (per image, date, and channel). Normalisation is known to keep features with a larger dynamic range from dominating and to help machine learning algorithms converge more quickly. In addition to feature values, the data preserve the date on which each image was acquired. While the acquisition dates for Sentinel-1 are fixed, those for Sentinel-2 vary because of the cloud-reduction method applied. All parcels are stored in npz format as arrays with dimensions time series length × number of channels × number of pixels in the parcel. A detailed analysis and visualisation of the dataset were carried out. Crop data for the dataset were collected through aerial and ground surveys between May 2017 and March 2018. The fusion strategy was applied to these Sentinel-1 (synthetic aperture radar) and Sentinel-2 (optical) datasets, whose simultaneous availability makes them well suited for integration.
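A minimal sketch of per-date, per-channel normalisation and npz storage for one parcel array of shape T × C × S. The data provider's exact normalisation is not described, so z-scoring over the pixels of each date and channel is an assumption, as are the array keys `parcel` and `fid`.

```python
import io

import numpy as np


def normalize_per_date_channel(parcel: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Z-score a parcel array of shape (T, C, S) separately for every
    date t and channel c, using the mean and std over the S pixels."""
    mean = parcel.mean(axis=2, keepdims=True)  # (T, C, 1)
    std = parcel.std(axis=2, keepdims=True)    # (T, C, 1)
    return (parcel - mean) / (std + eps)


rng = np.random.default_rng(0)
# Hypothetical parcel: 12 dates x 4 channels x 50 pixels, channels on very different scales.
loc = np.array([100.0, 0.1, 5.0, -20.0]).reshape(1, 4, 1)
scale = np.array([30.0, 0.02, 1.0, 4.0]).reshape(1, 4, 1)
parcel = rng.normal(loc, scale, size=(12, 4, 50))
normalized = normalize_per_date_channel(parcel)

# Store as .npz (in memory here; a file path works the same way) and read it back.
buf = io.BytesIO()
np.savez_compressed(buf, parcel=normalized.astype(np.float32), fid=np.int64(42))
buf.seek(0)
with np.load(buf) as data:
    restored = data["parcel"]
    fid = int(data["fid"])
print(restored.shape, fid)
```

After this step every channel contributes on a comparable scale, regardless of its original dynamic range.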


TABLE 1. Data Properties

Property Name   Description   Parameters
crop_id         Crop Class    1, 2, 3, 4, 5
crop_name       Crop Type     Wheat, Barley, Canola, Lucerne, Small grain grazing
fid             Field ID      Integer


The datasets for the two training areas and the testing region are shown in Figures 1, 2, and 3. The first training area contains 1715 parcels, the second 2436 parcels, and the testing region 2417 parcels.


[Map: South Africa train labels, tile 34S 19E 258N [1715]. Legend: Barley [196], Canola [231], Lucerne/Medics [678], Small grain grazing [153], Wheat [457]]

FIGURE 1. Training Area - 1


The datasets for tiles 34S 19E 258N and 34S 19E 259N in Cape Town, South Africa, were also visualised using the Python matplotlib module.

[Map: South Africa train labels, tile 34S 19E 259N [2436]. Legend: Barley [465], Canola [281], Lucerne/Medics [1114], Small grain grazing [280], Wheat [296]]

FIGURE 2. Training Area - 2

The vegetation index strongly influenced the 5-day time-series plots of the Sentinel-1 and Sentinel-2 datasets. This collection comprises images of an area in South Africa's Western Cape acquired by multispectral and synthetic aperture radar (SAR) satellites (Adeli et al.; Chang-An et al.), together with ground-based crop type labels. Five distinct crop types were grown in 2017: lucerne/medics, canola, wheat, barley, and small grain grazing. The area of interest (AOI) consists of three tiles: two tiles provide the training labels, and one tile is used for testing. The input images are time series (daily and 5-day composites) from Sentinel-2 and Sentinel-1, with a separate collection for each source. The five-day time-series plots of Sentinel-1 and Sentinel-2 for the training regions are shown in Figures 4 and 5, respectively.
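The per-class curves behind time-series plots such as those in Figures 4 and 5 can be computed as below before being handed to matplotlib. The input layout (one parcel-mean value per 5-day composite) and all numbers are hypothetical, and the plotting call itself is left out.

```python
import numpy as np


def class_mean_series(series: np.ndarray, labels: np.ndarray) -> dict:
    """Average per-parcel time series (shape (P, T)) within each crop class.

    Returns {class_id: mean series of length T}; each curve can then be drawn
    with matplotlib, e.g. plt.plot(days, curves[c], label=str(c)).
    """
    return {int(c): series[labels == c].mean(axis=0) for c in np.unique(labels)}


# Hypothetical parcel-mean NDVI over four 5-day composites, crop ids as in Table 1.
series = np.array([
    [0.2, 0.4, 0.6, 0.5],
    [0.3, 0.5, 0.7, 0.6],
    [0.1, 0.1, 0.2, 0.2],
    [0.2, 0.2, 0.3, 0.3],
    [0.5, 0.6, 0.6, 0.4],
    [0.4, 0.6, 0.8, 0.6],
])
labels = np.array([1, 1, 4, 4, 2, 2])
curves = class_mean_series(series, labels)
print({c: np.round(v, 2) for c, v in curves.items()})
```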


[Map: labels [2417]. Legend: Barley [192], Canola [133], Lucerne/Medics [1368], Small grain grazing [419], Wheat [300]]

FIGURE 3. Testing Area


3. Methodology 


[Five-day time-series panels: 34S_19E_258N_34S_19E_258N; 34S_19E_258N_34S_19E_259N]

FIGURE 4. Sentinel 1 Five days time series plot

[Five-day time-series panels: 34S_19E_258N_34S_19E_258N; 34S_19E_258N_34S_19E_259N]

FIGURE 5. Sentinel 2 Five days time series plot

2023, Vol. 05, Issue 05S

The first inputs of the PSE-TAE encoder architecture are the Sentinel-1 and Sentinel-2 datasets (Fare et al.). Among the supervised learning algorithms designed specifically for satellite image time series (SITS) classification, we chose the Pixel Set Encoder-Temporal Attention Encoder (PSE-TAE) as the deep learning architecture for studying the different fusion methods (Yuan et al.). PSE-TAE is a spatio-temporal classifier for object-level SITS classification. We assume that parcel geometries are known, as is the case for most European agricultural parcels. Three factors led to the choice of PSE-TAE: (i) it handles varying parcel sizes and irregular temporal sampling; (ii) it captures long-term dependencies through self-attention (Vaswani et al.); and (iii) its reduced memory footprint makes it computationally efficient. The two main components of the system are the spatial encoder (pixel-set encoder) and the temporal attention encoder. In this setting, layer-level fusion is performed after the PSE module by concatenation. Because the PSE produces one embedding per acquisition date, its output time series has the same length as the input time series, so Sentinel-1 and Sentinel-2 embeddings can be concatenated only if their time series have the same length. The Sentinel-1 input time series is therefore resampled, using the same method as in the initial (input-level) fusion, to match the length of the Sentinel-2 time series. It is also possible to rescale the Sentinel-1 PSE embedding directly; however, this makes the Sentinel-1 PSE module larger than necessary.

The classification implementation follows a semi-supervised learning approach (Y. C. A. P. Reddy, Pulabaigari, and B. E. Reddy), which replaces the multilayer perceptron decoder (Karami, Attari, and Tavakoli) with a new semi-supervised decoder. This type of learning uses both labelled and unlabelled data to train the system; the amount of labelled data is typically small compared with the amount of unlabelled data. The basic process is to group related data using an unsupervised learning algorithm and then use the previously labelled data to label the remaining unlabelled data.
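The length-matching step described earlier in this section (resampling Sentinel-1 onto the Sentinel-2 dates, then concatenating per-date embeddings) can be sketched as follows. The text only says that the resampling reuses the method of the initial fusion, so linear interpolation over acquisition day is an assumption, and all dates and arrays are hypothetical stand-ins.

```python
import numpy as np


def resample_to_dates(x: np.ndarray, src_days: np.ndarray, dst_days: np.ndarray) -> np.ndarray:
    """Linearly interpolate a (T_src, ...) time series onto new acquisition days.

    Days outside [src_days.min(), src_days.max()] take the nearest endpoint
    value (np.interp clamps).
    """
    flat = x.reshape(x.shape[0], -1)  # (T_src, F)
    cols = [np.interp(dst_days, src_days, flat[:, j]) for j in range(flat.shape[1])]
    out = np.stack(cols, axis=1)      # (T_dst, F)
    return out.reshape((len(dst_days),) + x.shape[1:])


rng = np.random.default_rng(0)
s1_days = np.array([0, 12, 24, 36, 48])        # hypothetical regular Sentinel-1 dates
s2_days = np.array([3, 8, 20, 33, 45, 50])     # hypothetical cloud-filtered Sentinel-2 dates
s1 = rng.normal(size=(5, 2, 10))               # T x C x S (e.g. VV and VH backscatter)
s1_on_s2 = resample_to_dates(s1, s1_days, s2_days)  # (6, 2, 10): same T as Sentinel-2

# Once both series share T, the per-date PSE embeddings can be concatenated.
emb_s1 = rng.normal(size=(6, 64))              # stand-in for the Sentinel-1 PSE output
emb_s2 = rng.normal(size=(6, 64))              # stand-in for the Sentinel-2 PSE output
fused = np.concatenate([emb_s1, emb_s2], axis=-1)
print(s1_on_s2.shape, fused.shape)
```

Resampling the input (rather than the embedding) keeps the Sentinel-1 PSE no larger than it needs to be.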


3.1. Pixel Set Encoder 


In recent years, convolutional neural networks (CNNs) have become the standard tool for extracting spatial information from images. Our results suggest, however, that convolution may not be the best strategy for analysing agricultural parcels in medium-resolution satellite imagery. First, textural information is difficult to extract from satellite imagery at the spatial resolution that accompanies high revisit frequencies. Second, to train a CNN effectively, the data should be organised into stacks of equally sized images (Nowakowski et al.). Because parcel sizes differ, this consumes a lot of memory: small parcels must be repeatedly oversampled to avoid losing texture information in large parcels. To circumvent these two problems, we adopt the Pixel-Set Encoder (PSE), an alternative design inspired by the DeepSet architecture (Zaheer et al.) and by PointNet, the widely used point-set encoder for processing 3D point clouds. Instead of using texture information, the network computes learned statistical descriptors of the spectral distribution of the parcel's pixels. A set of S pixels is randomly selected from the N pixels of the parcel (indices in [1, N]). If the parcel contains fewer than S pixels, randomly selected pixels are repeated to reach this fixed size. The same set is used to sample all T acquisitions of a given parcel. A shared multilayer perceptron (MLP1), consisting of fully connected layers, batch normalisation, and rectified linear units (ReLUs), processes each sampled pixel. The resulting features are pooled along the pixel axis (of size S) to produce a vector that summarises the statistics of the parcel and is invariant to permutations of the pixel indices. We append to this learned feature the geometric attributes f: the perimeter, the number of pixels N, the cover ratio (N divided by the number of pixels in the bounding box), and the ratio of perimeter to surface area of the parcel. MLP2 then maps this vector to the spatio-spectral embedding e(t) of the parcel at time t.

[Diagram: Sentinel-1 (SAR) and Sentinel-2 (optical) parcels, each passed to feature extraction using encoders]

FIGURE 6. Methodology for the training process using Semi Supervised Learning

FIGURE 7. Pixel Set Encoder
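A minimal NumPy sketch of the Pixel-Set Encoder forward pass described above, for a single parcel. Assumptions: random weights, no biases, batch normalisation omitted, and mean plus standard deviation as the pooling over the pixel axis; the function names are illustrative, not from a released implementation.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def sample_pixels(n_pixels: int, s: int, rng: np.random.Generator) -> np.ndarray:
    """Draw S pixel indices from the N pixels of a parcel.
    If N < S, every pixel is kept and randomly chosen pixels are repeated."""
    if n_pixels >= s:
        return rng.choice(n_pixels, size=s, replace=False)
    extra = rng.choice(n_pixels, size=s - n_pixels, replace=True)
    return np.concatenate([np.arange(n_pixels), extra])


def pixel_set_encoder(parcel, geom, w1, w2, s, rng):
    """parcel: (T, C, N) spectra; geom: (G,) geometric features f.
    Returns the per-date embeddings e(t) as an array of shape (T, D)."""
    idx = sample_pixels(parcel.shape[2], s, rng)     # one subset shared by all T dates
    x = parcel[:, :, idx].transpose(0, 2, 1)         # (T, S, C)
    h = relu(x @ w1)                                 # shared MLP1 applied to every pixel
    pooled = np.concatenate([h.mean(axis=1), h.std(axis=1)], axis=-1)  # order-invariant
    f = np.tile(geom, (parcel.shape[0], 1))          # same geometry at every date
    return relu(np.concatenate([pooled, f], axis=-1) @ w2)  # MLP2 -> e(t)


rng = np.random.default_rng(0)
T, C, N, S, H, D = 8, 10, 23, 32, 16, 32             # N < S, so pixels get repeated
parcel = rng.normal(size=(T, C, N))
geom = np.array([1.2, 0.23, 0.6, 0.52])              # hypothetical, pre-scaled f values
w1 = rng.normal(scale=0.1, size=(C, H))
w2 = rng.normal(scale=0.1, size=(2 * H + geom.size, D))
emb = pixel_set_encoder(parcel, geom, w1, w2, S, rng)
print(emb.shape)
```

Because pooling only uses means and standard deviations over pixels, shuffling the pixel order leaves the embedding unchanged, which is the permutation invariance mentioned above.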


3.2. Temporal Attention Encoder 


TAE is based on self-attention. Attention is one of the best-known ideas in deep learning: it was originally developed for sequence-to-sequence (Seq2Seq) models in neural machine translation (Sutskever et al.) and has since been extended to tasks such as image captioning. Temporal Attention Encoders (TAE) are built on self-attention (Niu, Zhong, and Yu). This technique builds a representation of a sequence (here, the time series) by emphasising the relationships between different positions in the input sequence. A positional encoding based on sine and cosine functions preserves the relative position of each element of the sequence; it is added to the PSE embedding, and the TAE takes this sum directly as input. Unlike RNNs (Sherstinsky), which process data sequentially, multi-head attention allows the model to jointly attend to information from several representation subspaces at different temporal positions while permitting parallel computation. A multi-layer perceptron (MLP) processes the resulting TAE embeddings to produce class logits.

FIGURE 8. Temporal Attention Encoder
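The positional encoding and multi-head attention described above can be illustrated with the sketch below. It uses generic Transformer-style self-attention over all dates; the original TAE design differs in details (for example in how its queries are formed), and the scale `tau = 1000`, the weights, and the acquisition days are assumptions.

```python
import numpy as np


def positional_encoding(days: np.ndarray, d: int, tau: float = 1000.0) -> np.ndarray:
    """Sinusoidal encoding of acquisition days (d must be even):
    PE[t, 2i] = sin(day_t / tau**(2i/d)),  PE[t, 2i+1] = cos(day_t / tau**(2i/d))."""
    denom = tau ** (2 * np.arange(d // 2) / d)   # (d/2,)
    angles = days[:, None] / denom[None, :]      # (T, d/2)
    pe = np.zeros((len(days), d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, wq, wk, wv, n_heads):
    """Scaled dot-product self-attention over the T dates, split into heads.
    x: (T, d); wq, wk, wv: (d, d). Returns (T, d) outputs and (heads, T, T) weights."""
    t, d = x.shape
    dh = d // n_heads

    def split(m):  # (T, d) -> (heads, T, dh)
        return m.reshape(t, n_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    att = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # each row sums to 1
    out = (att @ v).transpose(1, 0, 2).reshape(t, d)        # all heads computed at once
    return out, att


rng = np.random.default_rng(0)
T, d, heads = 6, 16, 4
days = np.array([5.0, 10.0, 20.0, 35.0, 50.0, 80.0])  # hypothetical acquisition days
e = rng.normal(size=(T, d))                            # stand-in PSE embeddings e(t)
x = e + positional_encoding(days, d)                   # TAE input: sum of both embeddings
wq, wk, wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
out, att = multi_head_self_attention(x, wq, wk, wv, n_heads=heads)
print(out.shape, att.shape)
```

Every date attends to every other date in a single matrix product, which is what allows the parallel computation that sequential RNNs lack.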


3.3. Fusion 


The three basic fusion techniques (Ofori-Ampofo, Pelletier, and Lang) are input-level, layer-level, and decision-level fusion. Layer-level fusion, used in this work, is the best of these techniques for improving the classification performance of combined optical and radar data. It is implemented with the Pixel-Set Encoder-Temporal Attention Encoder (PSE-TAE) (He, Chow, and J.-D. Zhang; Fiorini, Ciavotta, and Maurino), a state-of-the-art, self-attention-based architecture designed specifically for object-based classification of satellite image time series (SITS). PSE-TAE overcomes the limitations of convolution-based designs by combining attention with a pixel-set encoder. These encoders use randomly selected samples of pixels to produce a learned statistical descriptor of the spectral distribution of the parcel data: a shared MLP processes the pixels to construct the spatio-spectral embedding of each acquisition. Attention mechanisms are another way for deep neural networks to focus selectively on relevant information and ignore the rest, and TAE is based on self-attention. Layer-level fusion combines the data by concatenating the embeddings of two separate PSE-TAE networks after the TAE module, and the MLP classifier follows the concatenation. This step is straightforward because both embeddings have the same size.

FIGURE 9. Layer level fusion architecture
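A sketch of the layer-level fusion head described above: concatenate the two TAE embeddings and pass the result through an MLP that outputs logits for the five crop classes. The layer sizes, random weights, and the mapping from class index to `crop_id` are assumptions for illustration.

```python
import numpy as np


def mlp_logits(z: np.ndarray, layers) -> np.ndarray:
    """Fully connected layers with ReLU in between; the last layer yields class logits."""
    for i, (w, b) in enumerate(layers):
        z = z @ w + b
        if i < len(layers) - 1:
            z = np.maximum(z, 0.0)
    return z


def layer_level_fusion(emb_s1: np.ndarray, emb_s2: np.ndarray, layers) -> np.ndarray:
    """Concatenate the per-parcel Sentinel-1 and Sentinel-2 TAE embeddings
    (both of shape (batch, d)) and classify the fused vector with an MLP."""
    fused = np.concatenate([emb_s1, emb_s2], axis=-1)  # (batch, 2d)
    return mlp_logits(fused, layers)


rng = np.random.default_rng(0)
batch, d, hidden, n_classes = 3, 128, 64, 5           # five crop classes (Table 1)
emb_s1 = rng.normal(size=(batch, d))                  # stand-in Sentinel-1 TAE output
emb_s2 = rng.normal(size=(batch, d))                  # stand-in Sentinel-2 TAE output
layers = [
    (rng.normal(scale=0.05, size=(2 * d, hidden)), np.zeros(hidden)),
    (rng.normal(scale=0.05, size=(hidden, n_classes)), np.zeros(n_classes)),
]
logits = layer_level_fusion(emb_s1, emb_s2, layers)
predicted_crop_id = logits.argmax(axis=-1) + 1        # map index 0..4 to crop_id 1..5
print(logits.shape, predicted_crop_id)
```

Equal embedding sizes are convenient but not strictly required for concatenation; they only fix the input width of the first MLP layer.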


3.4. Fix Match Learning Algorithm 


FixMatch (Sohn et al.) combines two SSL (semi-supervised learning) techniques: pseudo-labelling and consistency regularisation. One of its characteristics is that it uses separate weak and strong augmentations for consistency regularisation (H. Zhang et al.). Let $\mathcal{X} = \{(x_b, p_b) : b \in (1, \dots, B)\}$ be a batch of $B$ labelled examples for an $L$-class classification problem, where $x_b$ are the training examples and $p_b$ their one-hot labels. Let $\mathcal{U} = \{u_b : b \in (1, \dots, \mu B)\}$ be a batch of $\mu B$ unlabelled examples, where $\mu$ is a hyperparameter that controls the relative sizes of $\mathcal{X}$ and $\mathcal{U}$. Let $p_m(y \mid x)$ be the class distribution predicted by the model for input $x$, and let $H(p, q)$ denote the cross-entropy between probability distributions $p$ and $q$.

Consistency regularisation is an essential component of modern state-of-the-art SSL algorithms. It exploits unlabelled data by assuming that a model should produce similar predictions when given perturbed copies of the same image. Pseudo-labelling exploits the model itself to produce artificial labels for unlabelled data. In particular, it uses "hard" labels (i.e., the arg-max of the model output) and keeps only artificial labels whose largest class probability exceeds a predefined threshold $\tau$. The FixMatch loss consists of two cross-entropy terms: a supervised loss $\ell_s$, applied to labelled data, and an unsupervised loss $\ell_u$. Specifically, $\ell_s$ is the standard cross-entropy loss on weakly augmented labelled examples. For the unlabelled data, FixMatch computes an artificial label for each example, which is then used in a standard cross-entropy loss. We first compute the model's predicted class distribution for a weakly augmented version of the unlabelled image, $q_b = p_m(y \mid \alpha(u_b))$, where $\alpha(\cdot)$ denotes weak augmentation. The cross-entropy loss is then applied to the model's output for a strongly augmented version of $u_b$, denoted $\mathcal{A}(u_b)$, with $\hat{q}_b = \arg\max(q_b)$ acting as the pseudo-label.
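Using the notation above, FixMatch minimises $\ell_s + \lambda_u \ell_u$, where $\ell_u$ averages $\mathbb{1}(\max q_b \ge \tau)\, H(\hat{q}_b, p_m(y \mid \mathcal{A}(u_b)))$ over all $\mu B$ unlabelled examples. The NumPy sketch below computes this loss from model outputs, assuming the logits for the weak and strong augmentations are already available; `tau = 0.95` and `lambda_u = 1.0` are illustrative values, and no gradients are computed.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(targets: np.ndarray, logits: np.ndarray) -> np.ndarray:
    """Per-example H(one_hot(target), softmax(logits)), computed stably."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_p[np.arange(len(targets)), targets]


def fixmatch_loss(logits_x_weak, y, logits_u_weak, logits_u_strong, tau=0.95, lambda_u=1.0):
    """Total loss l_s + lambda_u * l_u.
    logits_x_weak: outputs on alpha(x_b); y: integer labels (B,);
    logits_u_weak / logits_u_strong: outputs on alpha(u_b) / A(u_b), shape (mu*B, L)."""
    l_s = cross_entropy(y, logits_x_weak).mean()
    q = softmax(logits_u_weak)                 # q_b = p_m(y | alpha(u_b)), treated as constant
    q_hat = q.argmax(axis=-1)                  # hard pseudo-labels
    mask = q.max(axis=-1) >= tau               # keep only confident pseudo-labels
    l_u = (mask * cross_entropy(q_hat, logits_u_strong)).mean()  # averaged over all mu*B
    return l_s + lambda_u * l_u, mask


# Hypothetical outputs: 1 labelled example, 2 unlabelled examples, L = 3 classes.
logits_x = np.array([[2.0, 0.0, 0.0]])
y = np.array([0])
logits_u_weak = np.array([[10.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
logits_u_strong = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
loss, mask = fixmatch_loss(logits_x, y, logits_u_weak, logits_u_strong)
print(round(float(loss), 4), mask)
```

Only the first unlabelled example passes the threshold, so only it contributes a pseudo-label term.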


3.4.1. Mix Match Learning Algorithm


MixMatch (David et al.) augments each labelled data point once per batch and each unlabelled data point $K$ times, where $K$ is a hyperparameter. For each of the $K$ augmented copies, the model predicts the probabilities of the $L$ classes, and their average is used as the prediction for all $K$ copies. This average is sharpened to reduce its entropy before it is used as the final guessed label. $\mathcal{W}$ is created by concatenating and shuffling the augmented labelled and unlabelled data. MixUp combines the labelled data with the first $|\mathcal{X}|$ elements of $\mathcal{W}$ to create $\mathcal{X}'$, and the unlabelled data with the remaining elements of $\mathcal{W}$ to create $\mathcal{U}'$. The model's predictions on $\mathcal{X}'$ should match the labels, and its predictions on $\mathcal{U}'$ should match the guessed labels. Because the MixUp weight is taken as $\lambda' = \max(\lambda, 1 - \lambda) \geq 0.5$, each mixed example stays closer to its first input than to its second. Since the labels of the labelled data are known exactly, the cross-entropy $H$ is a suitable choice for the labelled loss $\mathcal{L}_{\mathcal{X}}$. Finally, the total loss is obtained by combining the two losses, weighted by the hyperparameter $\lambda_{\mathcal{U}}$.

FIGURE 10. Fix Match Semi supervised algorithm

FIGURE 11. Mix Match Semi supervised algorithm
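The MixMatch steps described above (averaging over K augmentations, sharpening, MixUp with $\lambda' = \max(\lambda, 1 - \lambda)$, and the combined loss) can be sketched in NumPy as follows. The temperature 0.5, `alpha = 0.75`, `lambda_u = 100.0`, and all inputs are illustrative assumptions, and a squared error is assumed for the unlabelled term.

```python
import numpy as np


def sharpen(p: np.ndarray, temperature: float = 0.5) -> np.ndarray:
    """Lower the entropy of a distribution: p_i^(1/T) / sum_j p_j^(1/T)."""
    p = p ** (1.0 / temperature)
    return p / p.sum(axis=-1, keepdims=True)


def guess_labels(probs_k: np.ndarray, temperature: float = 0.5) -> np.ndarray:
    """probs_k: (K, batch, L) predictions for K augmentations of each unlabelled input.
    Average over the K copies, then sharpen."""
    return sharpen(probs_k.mean(axis=0), temperature)


def mixup(x1, p1, x2, p2, alpha, rng):
    """MixUp with lambda' = max(lambda, 1 - lambda), so each result stays closer to (x1, p1)."""
    lam = rng.beta(alpha, alpha, size=(len(x1), 1))
    lam = np.maximum(lam, 1.0 - lam)
    return lam * x1 + (1.0 - lam) * x2, lam * p1 + (1.0 - lam) * p2


def mixmatch_loss(logits_x_mixed, targets_x_mixed, probs_u_mixed, targets_u_mixed, lambda_u):
    """L = L_X + lambda_u * L_U: soft-target cross-entropy on X', squared error on U'."""
    z = logits_x_mixed - logits_x_mixed.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    l_x = -(targets_x_mixed * log_p).sum(axis=-1).mean()
    l_u = ((probs_u_mixed - targets_u_mixed) ** 2).mean()
    return l_x + lambda_u * l_u


rng = np.random.default_rng(0)
# K = 3 augmented copies of one unlabelled input, L = 2 classes (hypothetical values).
probs_k = np.array([[[0.6, 0.4]], [[0.4, 0.6]], [[0.8, 0.2]]])
q = guess_labels(probs_k)  # mean [0.6, 0.4], sharpened towards class 0

# Mix four labelled examples (x1 = zeros) with four shuffled entries of W (x2 = ones).
x1, p1 = np.zeros((4, 3)), np.eye(2)[[0, 0, 1, 1]]
x2, p2 = np.ones((4, 3)), np.eye(2)[[1, 1, 0, 0]]
x_mixed, p_mixed = mixup(x1, p1, x2, p2, alpha=0.75, rng=rng)
print(np.round(q, 3), x_mixed[:, 0])
```

Every mixed input stays at or below 0.5, i.e. closer to its labelled source, which is the effect of taking $\lambda'$ rather than $\lambda$.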


3.5. Ada Match Learning Algorithm 


AdaMatch (Berthelot et al.) combines three tech- 
niques to deal with inconsistencies between the 
source and target distributions: random logit inter- 
polation, a relative confidence threshold, and modi- 
fied distribution alignment. 

The algorithm is implemented as follows.

First, two augmentations, one weak and one strong, are created for each input. The inputs are then split into two batches: one contains only source (labelled) data, while the other contains both source (labelled) and target (unlabelled) data. The model processes both batches to produce logits; because each batch's logits depend on that batch's own batch-normalisation statistics, the source logits differ between the two passes. To achieve consistency regularisation, the source logits from the source-only batch and those obtained from the mixed batch are combined using random logit interpolation.


3.5.1. Random logit interpolation: 


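Random logit interpolation randomly interpolates, entry by entry, between the source logits computed with source-only batch statistics and those computed with the joint source-and-target batch statistics. As a result, the logits reflect batch statistics that are representative of both domains.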
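Random logit interpolation can be illustrated with a toy model whose only batch-dependent part is a normalisation layer, so that the same source inputs yield different logits depending on whether the batch also contains target data. The toy model, the simulated domain shift, and the per-entry uniform draw of the interpolation weight are assumptions made for this sketch.

```python
import numpy as np


def batchnorm_linear(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Toy model: batch normalisation (statistics of the batch passed in) + linear layer."""
    xn = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-5)
    return xn @ w


def random_logit_interpolation(logits_alone, logits_joint, rng):
    """z = lam * z_alone + (1 - lam) * z_joint, with lam ~ U(0, 1) drawn per logit entry."""
    lam = rng.uniform(0.0, 1.0, size=logits_alone.shape)
    return lam * logits_alone + (1.0 - lam) * logits_joint


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 5))                         # 4 features -> 5 crop classes
source = rng.normal(0.0, 1.0, size=(8, 4))          # labelled source batch
target = rng.normal(2.0, 1.5, size=(8, 4))          # shifted, unlabelled target batch
z_alone = batchnorm_linear(source, w)               # source-only batch statistics
z_joint = batchnorm_linear(np.concatenate([source, target]), w)[: len(source)]
z_source = random_logit_interpolation(z_alone, z_joint, rng)
print(z_source.shape)
```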


3.5.2. Distribution alignment:


In real-world machine learning applications, domain shifts in the training data are a concern, since they arise when the data originate from several sources. Despite these shifts, a good ML model, for instance one that learns a domain-invariant representation, should continue to perform well.


Distribution alignment brings the distribution of the class predictions closer to the actual class distribution. Without it, the classifier may predict only the most common class or exhibit other failure modes. If the target label distribution were known, it could be used directly; when it is unknown, the only available distribution is the source label distribution. Pseudo-labels are then assigned to these aligned outputs using the class with the highest confidence.

FIGURE 12. Ada Match Semi supervised algorithm
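A sketch of distribution alignment as described above: target predictions are rescaled by the ratio between the mean source prediction (standing in for the unknown label distribution) and the mean target prediction, then renormalised, and the most confident class becomes the pseudo-label. The function name and the probabilities are hypothetical.

```python
import numpy as np


def distribution_alignment(probs_source_weak, probs_target_weak, eps=1e-8):
    """Rescale target predictions by (mean source prediction / mean target prediction),
    then renormalise each row so it is again a probability distribution."""
    ratio = probs_source_weak.mean(axis=0) / (probs_target_weak.mean(axis=0) + eps)
    aligned = probs_target_weak * ratio
    return aligned / aligned.sum(axis=-1, keepdims=True)


# Hypothetical example: the source batch predicts both classes equally often,
# but on the target batch the classifier leans towards class 0.
probs_source = np.array([[0.9, 0.1], [0.1, 0.9]])
probs_target = np.array([[0.8, 0.2], [0.6, 0.4]])
aligned = distribution_alignment(probs_source, probs_target)
pseudo_labels = aligned.argmax(axis=-1)   # most confident class after alignment
print(np.round(aligned, 3), pseudo_labels)
```

Without alignment both target examples would be pseudo-labelled as class 0; after alignment the second one switches to class 1.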
3.5.3. Relative confidence threshold: 


ML models are often poorly calibrated, particularly on out-of-distribution data, so a fixed confidence threshold suited to the source domain may reject most target predictions. The relative confidence threshold is therefore adjusted according to the classifier's confidence on the weakly augmented source data and a user-provided confidence threshold. AdaMatch is preferable to FixMatch and MixMatch because it combines distribution alignment with FixMatch-style consistency regularisation.
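The relative confidence threshold can be sketched as follows: the user-provided threshold `tau` is multiplied by the mean top-1 confidence on weakly augmented source data, and only target pseudo-labels at or above the result are kept. The value `tau = 0.9` and the probabilities are assumptions for illustration.

```python
import numpy as np


def relative_confidence_mask(probs_source_weak, probs_target, tau=0.9):
    """Scale the user threshold tau by the mean top-1 confidence on weakly augmented
    source data, then keep target pseudo-labels whose confidence reaches it."""
    c_tau = tau * probs_source_weak.max(axis=-1).mean()
    mask = probs_target.max(axis=-1) >= c_tau
    return mask, c_tau


probs_source = np.array([[0.9, 0.1], [0.3, 0.7]])   # mean top-1 confidence 0.8
probs_target = np.array([[0.75, 0.25], [0.4, 0.6]])  # e.g. after distribution alignment
mask, c_tau = relative_confidence_mask(probs_source, probs_target, tau=0.9)
pseudo_labels = probs_target.argmax(axis=-1)
print(round(float(c_tau), 2), mask, pseudo_labels[mask])
```

With a fixed threshold of 0.9, neither target prediction would be kept; the relative threshold of 0.72 keeps the first.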


4. Results and Discussion 


The key benefit of semi-supervised learning over supervised and unsupervised learning is that it allows us to improve the performance and generalisability of our model. Large datasets, particularly those collected for commercial purposes, may include only a few labels because labelling is costly. Semi-supervised learning lets us work with such datasets without having to choose between supervised and unsupervised learning. The AdaMatch algorithm outperformed the other two semi-supervised algorithms in training and validation accuracy. Consequently, the semi-supervised decoder was successfully implemented after fusion in place of the multilayer perceptron decoder. Our research demonstrates that our methodology significantly decreases processing time and memory requirements while outperforming prior state-of-the-art approaches in terms of precision. The outcomes are as follows:


Table 2 presents the training and validation accuracy of each configuration, followed by the testing accuracy of the AdaMatch semi-supervised decoder. The PSE-TAE (Pixel Set Encoder and Temporal Attention Encoder) and layer-level fusion designs were used. Training on Sentinel-1 and Sentinel-2 individually, without layer-level fusion, yielded accuracies of 66 and 67 percent, respectively. With layer-level fusion, the accuracy rose to 94 percent, and the graph shows that accuracy increases with the number of epochs. Subsequently, semi-supervised decoders replaced the traditional multi-layer perceptron decoder. We examined and used FixMatch, MixMatch, and AdaMatch as semi-supervised decoders. FixMatch obtained 93 percent training and validation accuracy, followed by MixMatch (94.2 percent) and AdaMatch (94.8 percent). AdaMatch outperforms MixMatch, although the gap between the two approaches is small. Finally, the model was evaluated on the testing dataset using the Pixel Set-Temporal Attention classifiers and the AdaMatch semi-supervised decoder, which produced an accuracy of 80.3%.

TABLE 2. Training-Validation and Testing Accuracies

Metric              Sentinel-1   Sentinel-2   Fusion (MLP decoder)   FixMatch   MixMatch   AdaMatch   Testing
F1-score            0.51         0.56         0.93                   0.92       0.93       0.94       0.72
IoU                 0.38         0.43         0.88                   0.86       0.87       0.90       0.72
Overall Accuracy    0.65         0.66         0.94                   0.93       0.94       0.95       0.80
Kappa Coefficient   0.51         0.52         0.92                   0.91       0.92       0.93       0.70

FixMatch, MixMatch, AdaMatch, and Testing columns: data fusion with a semi-supervised decoder.

The resulting confusion matrix is shown below.


FIGURE 13. Confusion matrix after testing process


In the confusion matrix (heatmap) above (Nétek, Pour, and Slezakova), the labels are numbered from 1 to 5. A heatmap is a colour-encoded matrix representation of rectangular data; the plotting function takes a 2D dataset, which can be converted into an array, as its input. Heatmaps are an effective way to depict data because they can highlight relationships between variables such as time. Label 1 represents wheat, 2 barley, 3 canola, 4 lucerne, and 5 small grain grazing. Comparing the accuracy levels before and after fusion shows that, with the layer-level fusion technique, the model performs better and predicts crop types more accurately.
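The four metrics reported in Table 2 can all be derived from a confusion matrix such as the one in Figure 13. The sketch below shows the arithmetic on a hypothetical two-class matrix (not the paper's results); whether Table 2 uses macro or weighted averaging for F1 and IoU is not stated, so macro averaging is an assumption.

```python
import numpy as np


def metrics_from_confusion(cm) -> dict:
    """cm[i, j] = number of test parcels of true class i predicted as class j.
    Returns overall accuracy, Cohen's kappa, and macro-averaged F1 and IoU."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp              # predicted as the class but wrong
    fn = cm.sum(axis=1) - tp              # belonging to the class but missed
    oa = tp.sum() / n
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return {"overall_accuracy": oa, "kappa": kappa, "f1_macro": f1.mean(), "iou_macro": iou.mean()}


# Hypothetical two-class matrix, only to show the arithmetic.
cm = [[50, 10],
      [5, 35]]
print({k: round(float(v), 3) for k, v in metrics_from_confusion(cm).items()})
```

IoU is always at most F1 for the same class, which matches the ordering of the two rows in Table 2.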


5. Conclusion 


The large-scale management of agricultural plots is crucial from both a political and an economic perspective. Deep learning algorithms have significantly improved outcomes when using data in the spatial and temporal dimensions, which are essential for agricultural research. The fusion approach can overcome limitations of both the Sentinel-1 and Sentinel-2 datasets, such as the smaller number of bands in Sentinel-1 images and the shadows and cloud or smog obstructions in Sentinel-2 images, and it has been observed to improve the model's performance. Combining publicly available data from the Sentinel satellites with cutting-edge remote sensing techniques can provide cost-effective, accurate, and rapid information on crop extent and dynamics. In this work, PSE-TAE, a deep learning architecture that exploits both the spatial and temporal aspects of the data, is used to harmonise Sentinel-1 and Sentinel-2 time series for crop type mapping in Cape Town, South Africa. The computational efficiency of PSE and TAE enables quick evaluation of various model setups. Combining the Sentinel-1 and Sentinel-2 modalities helps improve performance on both majority and minority classes. Different types of fusion are recommended depending on the availability of class samples: any type of fusion is sufficient for well-represented classes, but layer-level fusion offers additional benefits. Switching from the multi-layer perceptron to the semi-supervised classifier proved challenging, but we eventually worked out how to replace the decoder successfully. The PSE-TAE system may therefore prove extremely helpful in resolving current farming and agricultural issues.